Preserving Edits When Perturbing Microdata for Statistical Disclosure Control
Authors
Natalie Shlomo, Ton De Waal
Abstract
To protect individuals in microdata from the risk of re-identification, a general perturbative method called PRAM (the Post-Randomization Method) is sometimes used for masking records. This method adds “noise” to categorical variables by changing the categories of a small number of records according to a prescribed probability matrix and a stochastic process based on the outcome of a random multinomial draw. Changing values of categorical variables, however, causes fully edited and logical records in the microdata to start failing edit constraints (i.e., logical rules), resulting in data of low utility. In addition, an inconsistent record signals that it has been perturbed for disclosure control, and attempts can then be made to unmask the data. Therefore, the perturbation process must take into account per-record micro edit constraints through post-editing, which ensures that the perturbed microdata satisfy all edits. File-level macro edit constraints, which take the form of information loss measures, are also defined to ensure that the overall utility of the data is not badly compromised given an acceptable level of disclosure risk. This paper discusses methods for perturbing microdata using PRAM while minimizing micro and macro edit failures.
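The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the category list, the transition matrix, the edit rule `no_married_children`, and the function `perturb_records` are all hypothetical, and the simple "redraw until the record passes its edits" loop is only a stand-in for the post-editing step the paper refers to.

```python
import random

# Hypothetical categories and PRAM transition matrix for a "marital" variable.
# Row i holds the probabilities of publishing each category when the true
# category is CATEGORIES[i]; every row sums to 1.
CATEGORIES = ["single", "married", "widowed"]
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]


def no_married_children(record):
    """Hypothetical micro edit: a child may only be recorded as single."""
    return record["age"] != "child" or record["marital"] == "single"


def perturb_records(records, variable, categories, transition, edits,
                    seed=0, max_tries=20):
    """Perturb one categorical variable, keeping every record edit-consistent.

    Assumption (not from the paper): a perturbed value that fails an edit is
    simply redrawn, and after max_tries failures the original value is kept.
    The input records are assumed to satisfy all edits already.
    """
    rng = random.Random(seed)
    index = {category: i for i, category in enumerate(categories)}
    result = []
    for rec in records:
        row = transition[index[rec[variable]]]
        published = dict(rec)  # fallback: unperturbed copy of the record
        for _ in range(max_tries):
            candidate = dict(rec)
            # One draw from the distribution given by the row of the matrix
            # that belongs to the record's true category.
            candidate[variable] = rng.choices(categories, weights=row, k=1)[0]
            if all(edit(candidate) for edit in edits):
                published = candidate
                break
        result.append(published)
    return result


records = [
    {"age": "child", "marital": "single"},
    {"age": "adult", "marital": "married"},
    {"age": "adult", "marital": "widowed"},
]
masked = perturb_records(records, "marital", CATEGORIES, TRANSITION,
                         [no_married_children], seed=1)
```

Rejecting and redrawing inconsistent values changes the effective transition probabilities for records constrained by an edit, so a sketch like this would still need the file-level information loss checks mentioned in the abstract before the output could be considered usable.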
Similar resources
Disclosure Control Methods and Information Loss for Microdata
Statistical disclosure control (SDC) seeks to modify statistical data so that they can be published without giving away confidential information that can be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by database users. SDC methods for microdata are usually known as masking methods, of which there is a...
Argus: Software for Statistical Disclosure Control of Microdata
In recent years Statistics Netherlands has developed a prototype version of a software package, ARGUS, to protect microdata files against statistical disclosure. In 1995 the present prototype version of ARGUS, namely version 1.1, was released. In this paper both the rules, based on checking low-dimensional combinations of values of so-called identifying variables, and the techniques, globa...
Calculating minimum k-unsafe and maximum k-safe sets of variables for disclosure risk assessment of individual records in a microdata set
In the framework of disclosure control of a microdata set, a unique record is at risk of being identified. Even if a record is not unique in the microdata set, it may be considered risky if the frequency k of the cell in which the record falls is small. The notion of minimum unsafe combination introduced by Willenborg and de Waal (1996) is important in this respect. The purpose of this paper...
Statistical Disclosure Control Methods for Census Frequency Tables
This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...
Statistical Disclosure Control for Data Privacy Preservation
With the phenomenal change in the way data are collected, stored and disseminated among various data analysts, there is an urgent need to protect the privacy of data. When individual data are disseminated among various users, there is a high risk of revealing sensitive data related to an individual, which may raise various legal and ethical issues. Statistical Disclosure Control (SDC) ...